1. Release Notes for IRIX 5.2 patch SG0000254
This release note describes patch SG0000254 to IRIX 5.2. It
contains the following information:
+o Hardware/software platforms supported
+o List of the bugs which are fixed by this patch
+o Compatibility considerations
+o List of subsystems included in this patch
+o Installation instructions
1.1 Hardware platforms supported
This patch to IRIX 5.2 supports the following machine types:
+o Challenge and Onyx with R4400 processors
+o Crimson (4D/510)
+o PowerSeries (4D/120, 4D/2xx, 4D/3xx and 4D/4xx)
1.2 Software platforms supported
This patch will install on the following software platforms:
+o IRIX 5.2
+o IRIX 5.2 for 200MHz IP19 (Challenge or Onyx, excluding
Onyx Extreme)
1.3 Bugs fixed by patch SG0000254
This patch contains fixes for the following problems which
exist in IRIX 5.2 (bug numbers from SGI bug tracking system
are included for reference):
+o Multiprocessor systems acting as NFS servers can crash
if multiple operations attempt to update the list of
exported NFS filesystems simultaneously (bug 141828).
+o It is possible for the system to loop forever printing
the message "CPU x WARNING:tlbmiss: invalid badvaddr
XXX". This could happen if there is an invalid
reference in the kernel to kernel heap space. When this
situation arose the system would have to be forcibly
rebooted. The kernel has now been corrected to panic
whenever such a situation arises so that the errant
code can be isolated (bug 189291).
+o When dbx is started on a user VME driver which maps in
VME bus memory via the mmap() system call, attempting
to print the contents of that memory using dbx crashes
the machine.
This was true for the Challenge/Onyx machines as well
as the older 4D series machines.
This bug is fixed in this patch release. It is now
possible for a user VME process to be run under dbx and
view the variables in VME memory using dbx primitives
like print. This allows users to examine the variables
as bytes, half-words, or words depending on the type of
VME memory.
Trying to print a structure whose size is greater than
the maximum access size supported by the VME board (in
the case of D16 it could be 2 bytes, for D32 it could
be 4 bytes, etc) still does NOT work. This is
primarily due to VME bus access being size sensitive. A
D16 board may only respond to 2 byte size access.
Trying any other size access could cause problems for
the board. As a result, when a user tries to print a
structure, the kernel does not know what size operation
is right for the VME address space in question. If the
size is more than 8 bytes, it returns an error (bug
189318).
+o Lost clock interrupts on Power Series machines cause
time to drift under a heavy system load. This patch
provides a temporary workaround to the problem on
machines equipped with an IO3 board. Machines which do
not have an IO3 board installed will be unaffected by
this patch (bug 192233).
+o The df(1) command can return a negative number as the
count of blocks used on the /proc file system under
some conditions (bug 193935).
+o The IO4 serial driver (only applicable on Challenge and
Onyx) has been modified for enhanced serial performance
at sustained high baud rates. The interrupt priority of
the duarts has been increased to prevent long interrupt
masking under heavy loads from causing the duart
hardware to drop incoming characters. As a side effect,
it is now possible to program the duarts to interrupt
less frequently, reducing the cost in cpu usage of
heavy serial traffic. Users should see better
performance at a lower cost.
Users are advised, however, that since the duart now
interrupts at a very high priority, it is now possible
to bring the entire machine to a halt by flooding it
with serial traffic on a large number of ports at
maximum baud rate. The machine may reach a state where
it spends 100% of its time handling serial interrupts.
Note that only the master cpu is actually tied up in
this fashion, but other cpus may also be tied up
waiting for the master cpu to release a needed
resource.
The high bandwidth capability and decreased cost are
nullified if the duart_rsrv_duration timeout variable
is configured to 0 at kernel build time, or if this
value is reset at runtime with the SIOC_ITIMER ioctl
command (see serial(7)). A 0 value in this case
indicates that the user wishes the smallest possible
latency receiving characters, and all of the tricks to
improve high baud rate performance entail some latency,
so they cannot be used in this case. The user can
expect a significant performance penalty when setting
this timeout to 0 (bug 200377).
+o There is a bug in the gang scheduler that prevents
priority from being observed between gangs on the same
queue. Additionally, the batch gang queue can
improperly run gangs even though the gang queue has
valid work in it (bug 200394).
+o There is a bug in the code that keeps track of the IP
multicast addresses that a host is accepting. Systems
which use IP multicasting occasionally have some
multicast addresses deleted when they are still in use
or continue to listen for multicast addresses that are
no longer in use (bug 201283).
+o Multiprocessor Challenge and Onyx machines running IRIX
5.2 can hang as a result of a software deadlock (bug
204252).
+o The IRIX Extent File System code in 5.2 has the
property that files which are open and have been
extended since the last time they were closed are
likely to be lost when the system crashes for any
reason. Changes have been made to the file system code
in the kernel and to the file system check utility
(fsck) to reduce significantly the amount of data that
is lost when the system crashes or loses power with
extended application files still open. Note that there
is still no guarantee that all writes done by
applications will be preserved across a system crash.
The file system buffers writes and commits the data to
disk asynchronously by design (bug 204253).
+o Extending a file by writing to it on an NFS mounted
file system was slower than it should have been because
of incorrect interactions between the NFS server code
and the file system on the server side (bug 204732).
+o MP protection was added around automounter updates to
/etc/mtab.
+o The automounter no longer attempts to mount over an
already mounted (root) child filesystem. The child
filesystem will be the root, if the client is unable to
mount the parent filesystem, due to permissions,
timeouts, etc. (bug 172695).
+o There is a race condition in the communication between
the kernel and the local lock manager that can cause
NFS to hang (bug 205438).
+o There is a race condition in NFS client handle
allocation that can cause NFS to hang under heavy loads
on the client (bug 205453).
+o On large memory systems, the kernel software previously
had no throttle mechanism on the use of kernel virtual
address space. Kernel virtual address space is used to
map the kernel and its control data structures.
Sometimes, EFS, NFS, raw I/O and other operations could
cause the O/S to consume too much kernel virtual space
to map file system buffers. A new kernel variable
"bmappedpct", dynamically tunable, has been added to
limit the percentage of "syssegsz" kernel virtual space allowed
to be used by the file system buffers. When that value
is exceeded, the system actively attempts to reclaim
virtual address space. As shipped, this patch sets this
value at 50%. Setting this tunable variable to "100"
(or 100%) effectively disables this new control (bug
205422).
+o Related to bug 205422, under some circumstances buffers
associated with a logical volume could remain mapped
for extended periods of time. Usually this would not
cause a problem, but in cases where there is a limited
amount of kernel virtual space available, this could
make troubles worse. The fix included in this patch
causes the logical volume driver to always unmap
buffers it maps (bug 250335).
+o The logical volume driver has a race condition between
the "open" and "ioctl" entry points. This patch
includes a fix that serializes opens and ioctls on
logical volumes. This fixes problems encountered by
running multiple mklv commands on the same LV at the
same time (bug 250334).
+o A race condition exists between a process exiting and
looking at that same process's credentials using the
/proc interface. The /proc interface attempts to look
at the process's credentials after releasing a lock on the
process entry. If the process exits within a few
instructions of the lock being released, then the /proc
support can use an invalid pointer and panic the
system. The solution in this patch holds the process
entry locked until all credential information is copied
(bug 249685).
+o The profiling clock was running continuously on all
processors, even when no profiling was in progress.
This bug affects Challenge and Onyx only (bug 206673).
+o Under certain loads, the system occasionally appears to
be idle for periods up to 90 seconds, even when there
are active jobs that should be running. This bug
affects Challenge and Onyx only (bugs 193082 and
207844).
+o Internet port numbers that can be automatically
assigned were limited to 5000. This has been increased
to 65535.
+o Several software deadlocks that can cause the system to
hang have been fixed (bug 208087).
+o The normal diagnostics run at system powerup leave some
error bits set in the hardware that were not being
completely cleared by the operating system at boot
time. This residual error state causes other hardware
errors to be misdiagnosed. The kernel boot code now
clears these error bits. This bug affects Challenge
and Onyx only (bug 209406).
+o There is an error in the system audit trail mechanism
that can cause the system to crash when handling
pathnames of certain formats (bug 212708).
+o There is an error in the gang scheduler that can, under
certain circumstances, cause a gang to starve if it has
a large number of processes associated with it (bug
214170).
+o Multiprocessing EVEREST systems failed to start all
processors under some software configurations. This
problem was most notable with a kernel linked for
debugging, and no symmon available at boot time.
Usually only the master CPU would boot, with all slaves
failing to start. The problem has also been seen on
non-debug versions of the kernel. The problem was
caused by a race condition between the master CPU and
the slaves during the early boot process. The
appropriate synchronization is implemented in this
patch (bug 214364).
+o Using a regular file in a file system as a
supplementary swap area can cause the system to crash
during heavy swapping (bug 214374).
+o The combination of heavy outbound network traffic using
large buffers (as is done by doing ftp puts, for
example) and heavy page aging by the virtual memory
system when free memory is low can cause a
multiprocessor system to hang in the page flipping code
(bug 216587).
+o The disk quotas facility did not work in the previous
IRIX 5.2 patches SG0000001 and SG0000022. This has
been fixed in this patch release.
+o There is a performance problem related to the creation
of sproc children that have local mappings that results
in excessive rfault rates for the child. This occurs
when the parent has the local mapping already in its
address space when sproc is called (bug 222221).
+o A particular POSIX conformance bug that caused the
updating of file access time for a read on a file on a
read-only file system has been fixed (bug 223286).
+o Another POSIX conformance bug has been fixed: an fcntl
call that duplicated a file descriptor enough times to
exceed the user's allowable number of open file
descriptors returned error EMFILE. It now returns
EINVAL (bug 223492).
+o Still another POSIX conformance bug that has been
closed in this patch is the fact that if a process
which handles SIGCONT is sleeping inside a system call
at an interruptible priority, the sending of a job
control stop signal will interrupt the system call,
returning -1 and setting errno to EINTR (bug 223509).
+o Sproc processes that exec may carry with them the wrong
pages from the parent, potentially causing the execed
process to hang (bug 227235).
+o Database systems using asynchronous I/O could corrupt
data. However, only SYBASE version 10 is known to
trigger this problem (bug 229896).
+o Support has been added for 4MB secondary cache systems
in both IRIX and the IO4 prom.
+o The IO4 prom image also incorporates new segment loader
software, and multiple versions of the IO4 software for
different architectures. In particular, this version of
the IO4 prom supports both IP19 and IP21 CPU boards.
Some SCSI initialization and timeout values were
changed to recover from the system attempting to boot
from a disk that is not yet "ready".
+o The IO4 prom did not allow booting from a SCSI disk that
was above address 7 on the SCSI bus (bug 240879).
+o A new feature was introduced in patchSG0000022 for
EVEREST systems only (Challenge and Onyx with R4400
processors). For a certain class of memory errors,
recovery is possible in software because the data lost
is no longer required. This feature was disabled by
default, but is now enabled in this patch. For
example, errors which occur when zeroing a new page for
a task may be safely ignored, since the previous data
on the page is no longer needed.
The kernel variable "ecc_recover_enable" enables and
disables this recovery feature. A value of 0 indicates
that recovery should not be attempted. A non-zero
value represents the number of seconds over which 32
error recovery attempts can be made. In general, a
value of 60 should be used to enable this feature.
This is the value that is now enabled by default.
+o A bug that occurred as part of patch 33, in which the
NOINTR directive could be ignored on EVEREST machines,
causing problems with real-time latency, has been
corrected (bug 235061).
+o Power Series and Crimson systems with dual VME buses
did not support user mode access (/dev/vme) properly.
A16 mode on the second bus did not work, and A32 did
not work on either bus. These problems are corrected.
+o VME write error handling for Challenge/Onyx systems was
not taking care of corner cases where a VME write error
followed by a VME read error would cause the systems to
crash in certain situations (bug 231142).
+o Fixed the problem where stressing a Power Series system
with multiple ethernets (et0 and enp0) would cause the
network subsystem to hang. This was also causing the SCSI
subsystem to hang (bug 188296).
+o System calls stat() or xstat() on a tty file can hang
in kernel mode, leaving the process unkillable (bug
230375).
+o Multiprocessor systems with R4000 cpus or R4400 cpus at
revision level 2.2 or less and which use loadable
drivers can crash due to a kernel segmentation
violation. Such systems with loadable drivers can
panic with the cause being an RMISS and the bad_addr
not matching the faulting pc. The workaround installed
in the kernel detects this particular path when it is
caused by an R4000 bug and allows the operation to be
retried, which is needed to correct the problem (Bug
#236338).
+o Changes were put into the kernel which allow the kernel
stack to be increased by an additional page for real-
time processes and otherwise on an as-needed basis.
This increases the reliability of the system by
eliminating scenarios in which the kernel stack might
overflow and panic the system, which occasionally arose
in systems making heavy use of remote file systems, for
example (Bug #240710).
+o Fixed a problem where a binary compiled on IRIX 4.0.5
that uses libc function getcwd() will fail on an IRIX
5.2 machine that has a RAID filesystem when the binary
is run on the RAID filesystem (bug 234992).
+o Under certain circumstances, mail could experience a
deadlock in accessing its lock file in an NFS-mounted
mail directory. This fix makes it possible for users
to have NFS-mounted mail directories (bug 228720).
+o A user-written program which attempted to read /dev/mem
or /dev/kmem could cause the system to crash (bug
189764). This problem is now resolved.
+o Writing to a named pipe over NFS could cause a system
panic. This problem could occur running previous IRIX
5.2 patches patchSG0000022, patchSG0000030,
patchSG0000033 or patchSG0000047, all of which are
replaced by this patch release.
+o The fuser command could cause a system panic when used
on a machine with heavy socket creation/deletion
activity (bug 209242).
+o IRIX 5.2 patchSG0000047 introduced a problem whereby an
EFast ethernet board would not be seen on the VME bus
of a PowerSeries (4D/120, 4D/2xx, 4D/3xx and 4D/4xx).
That problem is fixed in this patch release.
+o When TCP connections are being created at a high rate,
a system panic may occur with message "soaccept
!NOFDREF" (bug 249206). This fix avoids the race
between accept() and tcp_drop().
+o When TCP connections are being created at a high rate,
connections may time out even though the server is
largely idle, due to the backlog limit on the server's
initial connection socket being limited to a small
value (bug 245976). This change allows the maximum
backlog value to be reconfigured, by modifying the
variable somaxconn in /var/sysgen/master.d/bsd.
+o When remote TCP clients disappear forever (where the
client systems do not respond to pings), with
connections open and data queued for output, after the
local server has closed the connection, but before all
the data has been delivered and acknowledged, the TCP
socket is left in the kernel indefinitely, even if the
server set the SO_KEEPALIVE option (bug 248935). This
eventually uses up all available network buffer space.
This change adds a new kernel variable,
tcp_keep_timer_in_close, which may be set to a non-zero
value to permit SO_KEEPALIVE timeouts to act on such
sockets. The variable must presently be set using dbx
or some other such program which permits modification
of kernel variables.
+o The system accounting programs sar(1) and sadc(1M) fail
(dump core) when the system contains a large number of
disk partitions (bug 214394). This patch removes
limitations in these programs on the number of disk
partitions configured on a system.
+o Heavy disk usage involving 2 or more processes
repeatedly reading the same section of disk, or the
same file, could cause extremely slow response for any
other process. A scheduling change has been made that
prevents such processes from effectively blocking out
other running processes (bug 237460).
+o A kernel scheduler panic could happen in a specific
case of schedctl() changing a job priority from below
80 to above 80. This problem has been fixed by adding
retry logic in pugDuty() (bug 240971).
+o On an MP system, a process can change the priority of
another process which could be running on another CPU.
This bug fix will allow it to do so without running
into any kernel stack extension page mismatches (bug
252308).
+o A temporary workaround has been added for PROMs. Current
PROMs do not support port numbers with the sign bit set.
The workaround is to limit the port numbers of anonymous
connections to 32767. The problem was originally found in
using tftpd, when tftp could not be used on a 5.3
server (bug 231136).
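The sign-bit limitation in the last item above can be illustrated
with a little arithmetic. The following Python sketch is not taken
from the PROM or kernel sources. It assumes, for illustration only,
that the affected code treats the 16-bit port number as a signed
quantity. The names ANON_PORT_MAX, sign_bit_set and as_signed_16
are hypothetical.

```python
import struct

# Highest 16-bit port number whose sign bit (0x8000) is clear.
# This matches the 32767 limit quoted in the release note.
ANON_PORT_MAX = 0x7FFF


def sign_bit_set(port):
    """Return True if bit 15 of a 16-bit port number is set."""
    return bool(port & 0x8000)


def as_signed_16(port):
    """Reinterpret an unsigned 16-bit port number as a signed 16-bit value.

    Assumption: this mimics code that stores the port in a signed short.
    """
    return struct.unpack(">h", struct.pack(">H", port))[0]


if __name__ == "__main__":
    for port in (69, ANON_PORT_MAX, ANON_PORT_MAX + 1, 65535):
        print(port, sign_bit_set(port), as_signed_16(port))
```

Under that assumption, ports up to 32767 read the same whether they
are treated as signed or unsigned, while any higher port would appear
as a negative number to such code.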
1.4 Compatibility considerations
This patch includes slight content changes to the following
system header files as a part of the fix to prevent any
possible kernel stack overflow:
+o "/usr/include/sys/param.h"
+o "/usr/include/sys/proc.h"
These changes were made in a way to minimize compatibility
concerns, but it is still possible that software sensitive
to the exact kernel proc struct may need to be rebuilt.
For this reason, sites using CASEVision/ClearCase must
rebuild the MFS (MVFS for ClearCase 2.0 users) after
installing patchSG0000125, patchSG0000139 or patchSG0000254.
If these steps were taken after installing patchSG0000125
or patchSG0000139, they do not have to be repeated after
installing patchSG0000254. As the root user, execute the
following instructions and then reboot. The CPUBOARD value
IPxx may be determined from the hinv(1M) command.
If you are running ClearCase 1.1.4:
    % su
    # setenv CPUBOARD IPxx
    # cd /var/sysgen/boot
    # make -f /var/sysgen/Makefile.kernio mfs_param.o
    # mv mfs.o mfs.o.old
    # ld -o mfs.o -r premfs.o mfs_param.o
    # /etc/autoconfig -f
If you are running ClearCase 2.0:
    % su
    # setenv CPUBOARD IPxx
    # cd /var/sysgen/boot
    # make -f /var/sysgen/Makefile.kernio mvfs_param.o
    # mv mvfs.o mvfs.o.old
    # ld -o mvfs.o -r premvfs.o mvfs_param.o
    # /etc/autoconfig -f
Note: If you remove patchSG0000125, patchSG0000139 or
patchSG0000254, you need to perform these same steps.
1.5 Subsystems included in patch SG0000254
This patch includes changes to the following IRIX 5.2
products: compiler_dev, eoe1, eoe2, nfs and dev. The
patchSG0000254 image contains the following subsystems:
+o patchSG0000254.compiler_dev_sw.dbx
+o patchSG0000254.dev_hdr.lib
+o patchSG0000254.eoe1_sw.quotas
+o patchSG0000254.eoe1_sw.unix
+o patchSG0000254.eoe2_sw.audit
+o patchSG0000254.eoe2_sw.kdebug
+o patchSG0000254.eoe2_sw.perf
+o patchSG0000254.nfs_sw.nfs
1.6 Installation instructions
This patch is only installable on systems running IRIX 5.2.
This patch requires installation in miniroot mode. To
perform the installation, take the system down and follow
the normal procedures for starting up the installation tool
from the supplied release media. It is recommended that you
select all the patch subsystems that correspond to software
already installed on the system.
This patch will install on systems running IRIX 5.2, or on
Challenge or Onyx systems with the 5.2-200MHz release
installed to support IP19 200MHz CPU boards. In the case of
installing on the 5.2-200MHz release, inst will note an
apparent version mismatch for the subsystem
patchSG0000254.eoe1_sw.unix, as noted by:
    k N patchSG0000254.eoe1_sw.unix @ 0 14665+ IRIX Execution Environment
For correct installation of patchSG0000254, it is necessary
to issue the following inst command:
    set neweroverride on
in order to force inst to install
patchSG0000254.eoe1_sw.unix.
One way in which software patches differ from full releases
and maintenance releases is that patches are reversible:
you can remove the patch and restore the installed software
to its state before the patch was applied. This is done by
using the versions command as superuser:
    versions remove patchSG0000254
Since this patch replaces some kernel object files, it is
necessary to rebuild the kernel image and reboot after
removing the patch:
    autoconfig
    reboot